Two sigma
电话面试:
- Process vs Thread
A process is an executing instance of an application A thread is a path of execution within a process. Also, a process can contain multiple threads
Another difference between a thread and a process is that threads within the same process share the same address space, whereas different processes do not. This allows threads to read from and write to the same data structures and variables, and also facilitates communication between threads. Communication between processes – also known as IPC, or inter-process communication – is quite difficult and resource-intensive.
IPC-- inter-process communication 每个进程都有自己的一部分独立的系统资源, 彼此是隔离的。 为了使不同的进程互相访问资源并进行协调工作 ,才有了进程间通信。
实现用 Java RMI framework 实现 sockets, message queues, pipes What you are looking for is inter-process communication. Java provides a simple IPC framework in the form of Java RMI API. There are several other mechanisms for inter-process communication such as pipes, sockets, message queues (these are all concepts, obviously, so there are frameworks that implement these).
I think in your case Java RMI or a simple custom socket implementation should suffice.
- Latency vs Throughout
Latency is the time required to perform some action or to produce some result. Latency is measured in units of time -- hours, minutes, seconds, nanoseconds or clock periods.
Throughput is the number of such actions executed or results produced per unit of time. This is measured in units of whatever is being produced (cars, motorcycles, I/O samples, memory words, iterations) per unit of time. The term "memory bandwidth" is sometimes used to specify the throughput of memory systems.
example: The latency is: 8 hours. The throughput is: 120 cars / day or 5 cars / hour.
Hashtable :
HashMap Collision:
Sort:
design pattern:
客户反映网站变慢 client-server-database, 分三部分进行分析 , 客户端, 服务器, 数据库
1) is it the network ? ( i.e. from client to server)
1, Is the request not hitting cache of edge server in the CDN (Content delivery network)? Or Cache expired ?
if this case, it will be slower since we have to reach the origin server.
2, Check DNS response time (it should not be longer than 150ms or so)
3, If DNS is quick, next will send TCP SYN packet , compare this connection establish time with network roundtrip time. (TCP three way handshake: SYN->SYN-ACK->ACK, then and only then we can transmit data)
- Next, check the time diff between SYN sent from client and SYN-ACK sent back from server.
- Once TCP connection is established, client will request data from the server, check the time diff of
server receiving the request and server actually sends back response. (if this time is high, then it is server's problem) (retransimission will happen if packet is lost, then this is a network problem)
判断是网络还是server的问题,如果packet丢了,网路问题,如果三次握手,发回去,然后发回来慢,server问题
- server 端 is it server being slow? 1, check I/O wait , if wait is the high, may due to many users area accessing the server at the same time; 2, if CPU wait is low , we can rule out dis access.
(5) Given a stream of numbers, 怎么实时计算average, standard deviation, median?. 鍥磋鎴戜滑@1point 3 acres Followup: 怎么找出largest(smallest) K% elements?
power of 4 given an integer , write a function to check whether it is a power of 4
version1: naive solution
public class Solution {
public boolean isPowerOfFour(int num) {
while(num > 0) {
if(num == 1) {
return true;
}
if(num % 4 != 0) {
return false;
}else{
num /= 4;
}
}
return false;
}
}
version2 : bit manipulation
Remaining:
Behavior:
Culture: Bigger impact, newer/better technology, more responsbilities, work with smart people. Software Engineering is an industry of continous learning, and so should I, also I noticed that you have a learning culture here, a place of learning. Rather than many financial companies that keep technology as a back office, I'm looking for something that is tech-driven. I always believe that technology is the key on the road to successful strategy. life is about making decisions, it is like investments. We're downtown, but we're not Wall Street.
How it works: look for meaningful patterns in the world’s data. Then we use these insights to create investment strategies. Pattern. Meaning. Relationships. Power of data and technology. Predict the movement of the market (retirement investment).
下午: http://www.geeksforgeeks.org/online-algorithm-for-checking-palindrome-in-a-stream/
//然后问了client-server-database这种架构中,客户反映网站变慢。怎么确定是哪儿的问题。分三部分,客户端,服务器,数据库。 ///
- A front-end web server talks with an application server that talks with a middleware server that queries one or more database servers
- Then, all of those servers all might talk with DNS servers (DNS can time out requests) to look up IP addresses or map them back to server names
///
In general, these two parts: 1) network latency, the time it takes for the packets to traverse the network; 2) server delay, the time it takes for the server to process that application request.
In detail 1) Is it the network? (i.e. from client to sever)
- Is the request not hitting cache of edge server in the CDN? Or Cache expired? If this case, it will be slower since we have to reach the origin server. Depends on how CDN is set up, what optimization strategy is used?
- Check DNS response time (it should not be longer than 150ms or so)
- If DNS is quick, next will send TCP SYN packet, compare this connection establish time with network roundtrip time. (TCP three way handshake: SYN->SYN-ACK->ACK, then and only then we can transmit data)
- Next, check the time diff between SYN sent from client and SYN-ACK sent back from server.
- Once TCP connection is established, client will request data from the server, check the time diff of server receiving the request and server actually sends back response. (if this time is high, then it is server's problem) (retransimission will happen if packet is lost, then this is a network problem)
2) Is it server being slow? (Flow Chart: https://dl.dropboxusercontent.com/u/18554/slow%20server%20troubleshooting.png)
- Check I/O("wa"), If wait is high, may due to many users are accessing the server at the same time;
- If CPU wait is low, we can rule out disk access. Anything above 10% I/O wait should be considered high.
- Check CPU idle time, If your idle time is consistently above 25%, consider it "high enough".
- If I/O wait is low and idle time is low, this means some process is consuming large percent of CPU, use top to identify the process.
- If I/O wait is low and idle time is high (this is good), so it's not due to I/O or CPU issues, slowness is mostly caused by application level (deadlocks?).
- Check memory usage (top -> M), if there is any memory leaks
3) Database slowness:
1. Query is not well formed, Database structure is not well defined/normalized
2. Missing cache of frequntly used queries, etc.
3. Number of rows in the table too large
4. Connections are not being pooled (i.e. creating and destroying connection everytime),
or connections are not closed/returned to pool in case of exception
5. Lacking of stored procedure (pre compiled SQL queries)
6. Missing index/Poor index design
Other: CDN: the CDN smartly returns the best possible IP address to handle the DNS resolution request (geographically). ♦The server is encountering error conditions. ♦Dependent front end or middleware, database, server or other systems are causing delays.
下午:
- // http://www.geeksforgeeks.org/online-algorithm-for-checking-palindrome-in-a-stream/
// leetcode: ugly number II: https://leetcode.com/problems/ugly-number-ii/ int nthUglyNumber(int n) {
if (n <= 0) return -1; vector<int> result = {1}; int i = 0; // pointer for 2 int j = 0; // pointer for 3 int k = 0; // pointer for 5 while(result.size() < n) { int next = min(result[i] * 2, min(result[j] * 3, result[k] * 5)); result.push_back(next); if (next == result[i] * 2) i ++; if (next == result[j] * 3) j ++; if (next == result[k] * 5) k ++; } return result.back();
}
// - 给你两个independent queue,每个queue都存着timestamp,只能有getNext()来取queue里面的timestamp,每个timestamp只能被取一次,比较这两个queue里的timestamp,如果差值<1,print这两个timestamp。例如: //Q1 0.2, 1.4, 3.0 //Q2 1.0 1.1, 3.5 //output: (0.2, 1.0), (1.4, 1.0), (0.2, 1.1), (1.4, 1.1), (3.0, 3.5) //最后用了两个thread,两个thread-safe queue来实现. 每次call getNext(), 和另一个queue里存的每个item相比, 相差大于一就删除, 否则输出 //https://www.ibm.com/developerworks/cn/aix/library/au-multithreaded_structures1/
include
include
include
using namespace std;
// this is the final result containing timestamp pairs < 1
vector
// this is a utility function
void calculate(list
// function that will be executed by thread 1 void task1(Stream s) { while (true) { double val = s.getNext(); mutex.lock(); calculate(Q1, Q2, val); mutex.unlock(); } }
// function that will be executed by thread 2 void task2(Stream s) { while (true) { double val = s.getNext(); mutex.lock(); calculate(Q2, Q1, val); mutex.unlock(); } }
int main() { // incoming streams Stream S1 = {0.2, 1.4, 3.0}; Stream S2 = {2.0, 2.1, 4.5};
// create the threads
thread t1(task1, S1);
thread t2(task2, S2);
// start the threads
t1.join();
t2.join();
}
- // Google Doc 这种多人同时编辑的,怎么让每个人看到的都是update的version //http://stackoverflow.com/questions/5772879/how-do-you-write-a-real-time-webbased-collaboration-tool-such-as-google-docs Operational transformation Instead of editing the doc directly, send operations to server and let server decide the final/combined state.
//----------------------text editor设计题----------------- //insert(p), delete(p1, p2), highlight(p1, p2),redo/undo, save/load update, search //text editor需要insert,remove,highlight,需要想办法去index每次插入的object,原po说的interval tree应该就是index的方式吧。 //关键点在于text打算怎么存 //store highlight? //他要求三天后再load这个text,需要可以undo三天前的操作. save的时候 保存成xml类型之类的 把之前的操作也一起存下来 struct Node { string text; int size; // if Node is leaf, size = text.size(); if not leaf, size = left.size + right.size Node left; // if no left child, this node is leaf Node right;
Node(string _text = "", int _size = 0) : text(_text), size(_size), left(NULL), right(NULL) {} };
class TextEditor { public: TextEditor() : root(NULL) { }
void initialize(const string &str) {
root = build(str);
}
char index(int i) {
return index(root, i);
}
private:
Node * root;
stack
Node * build(const string str) {
int n = str.size();
if (n <= 0)
return NULL;
if (n <= 2)
return new Node(str, str.size()); // creating leaf node
Node * node = new Node();
node->left = build(str.substr(0, n / 2));
node->right = build(str.substr(n/2, n - n/2));
node->size = str.size();
return node;
}
char index(Node * node, int i) {
if (!node || i >= node->size) return 'x';
if (!node->left) { // is leaf
return node->text[i];
}
if (i < node->left->size)
return index(node->left, i);
return index(node->right, i - node->left->size);
}
};
/**/
// - power of four // 解法1 bool isPowerOfFour(int num) { return !(num & (num -1)) && // power of 2 (num % 3 == 1); // 3n + 1 } // 解法2, long int bool isPowerOfFour(long num) { return !(num & (num -1)) && // power of 2 (num & 0x5555555555555555l); // 5 = 0101, power of fours must have 1 at every other 2 bits } // 解法3,recursive bool isPowerOfFour(int num) { if (num <= 0) return false; if (num == 1) return true; return !(num % 4) && isPowerOfFour(num / 4); }
// - wild card https://leetcode.com/problems/wildcard-matching/ // test case: empty str, without or ?, with , with ?, with and ? / t[i][j] := whether s[0..i-1] matchs p[0..j-1], i = [0 : s.size()], j = [0, p.size()]
1. if (s[i-1] == p[j-1] || p[j-1] == '?') -> t[i][j] = t[i-1][j-1]
2. else if (p[j-1] == '*') -> t[i][j] = t[i-1][j] || t[i][j-1]
3. else (i.e. s[i] != p[j-1]) -> t[i][j] = false
*/
bool isMatch(string s, string p) {
int m = s.size();
int n = p.size();
vector<vector<bool> > t(m + 1, vector<bool>(n + 1)); // default to all false
// initialize first column and row
t[0][0] = true; // s and p are both ""
for (int j = 1; j <= n; ++ j) {
if (p[j-1] == '*')
t[0][j] = true;
else
break; // if current is not '*', will not match onwards
}
for (int i = 1; i <= m; ++ i) { // note the bound is "<= n""
for (int j = 1; j <= n; ++ j) {
if (s[i-1] == p[j-1] || p[j-1] == '?') // if current char match
t[i][j] = t[i-1][j-1];
else if (p[j-1] == '*')
t[i][j] = t[i-1][j] || t[i][j-1];
// else, no match, leave as false
}
}
return t[m][n];
}
// 1D Array
bool isMatch(const char *s, const char *p) {
int m = strlen(s), n = strlen(p);
vector<bool> dp(m+1, false);
for(int i=0; i<m; i++) {
bool diag = dp[0];
dp[0] = i==0 ? true : false;
for(int j=1; j<n; j++) {
int temp = dp[j];
if(p[j-1]==s[i-1] || p[j-1]=='?')
dp[j] = i>0 && diag;
else if(p[j-1]=='*')
dp[j] = dp[j-1] || (i>0 && (diag || dp[j]));
diag = temp;
}
}
return dp[n];
}
// better // Greedy // use example: s = "abedc", p = "?b*c" -> true bool isMatch(string s, string p) { int m = s.size(), n = p.size(); int i = 0, j = 0, lastAsterisk = -1, lastMatch = -1; // lastMatch := last asterisk match
while (i < m) {
if (j < n && (s[i] == p[j] || p[j] == '?')) { // current is a match
i ++;
j ++;
continue;
}
if (j < n && p[j] == '*') {
lastAsterisk = j;
lastMatch = i;
j ++; // only j ++ when p[j] = '*'
continue;
}
// no match
if (lastAsterisk == -1)
return false;
else {
j = lastAsterisk + 1; // position of lastAsterisk stay the same
lastMatch ++;
i = lastMatch;
}
}
// check if remaining p[j] are all *'s
while (j < n) {
if (p[j] != '*')
return false;
j ++; // remember to increment j
}
return true;
}
// - remove subtree from index array tree
include
include
using namespace std;
struct Node { int parentIdx; bool valid; bool visited;
Node(int pi) : parentIdx(pi), valid(true), visited(false) {}
};
class ArrayTree {
public:
ArrayTree(vector
void deleteSubtree(int idx) {
if (!nodes[idx]->valid) // if already deleted
return;
reset(); // reset valid nodes back to unvisited
nodes[idx]->valid = false;
nodes[idx]->visited = true;
size --;
for (int i = 0; i < nodes.size(); ++ i) {
if (nodes[i]->visited)
continue;
explore(i);
}
}
void print() {
cout << "tree size: " << size << endl;
for (int i = 0; i < nodes.size(); ++ i) {
cout << "\tnode idx: " << i << ", parent idx: " << nodes[i]->parentIdx
<< ", visited: " << nodes[i]->visited << ", valid: " << nodes[i]->valid << endl;
}
}
private:
vector
bool explore(int idx) {
// if current is root or is already visited
if (nodes[idx]->parentIdx == -1 || nodes[idx]->visited) {
nodes[idx]->visited = true;
return nodes[idx]->valid;
}
nodes[idx]->visited = true;
// current validness depends on parent validness
bool isParentValid = explore(nodes[idx]->parentIdx);
if (nodes[idx]->valid != isParentValid) {
nodes[idx]->valid = isParentValid;
size --; // only decrement size when change from valid to invalid
}
return isParentValid;
}
void reset() {
for (Node * n : nodes) {
if (n->valid)
n->visited = false;
}
}
};
int main() {
vector<int> nums = {1, 5, 5, 2, 2, -1, 3};
ArrayTree at(nums);
at.print();
at.deleteSubtree(3);
at.print();
at.deleteSubtree(6);
at.print();
at.deleteSubtree(2);
at.print();
return 0;
}
// - game of life 1D
void gameOfLife1D(vector
for (int i = 0; i < board.size(); ++ i) {
int liveCount = (board[(i + n - 1) % n] & 1) + (board[(i + n + 1) % n] & 1);
if (liveCount == 1) { // change status
if (board[i] & 1) board[i] = 1; // 0 <- 1
else board[i] = 2; // 1 <- 0
} else {
if (board[i] & 1) board[i] = 3; // 1 <- 1
else board[i] = 0; // 0 <- 0
}
}
for (int i = 0; i < board.size(); ++ i)
board[i] >>= 1; // right shift 1 to get only new status
}
// - game of life 2D
// follow up: https://segmentfault.com/a/1190000003819277
void gameOfLife2D(vector
if (board[i][j] & 1) { // current is live (i.e. lower bit is 1)
if (liveCount == 2 || liveCount == 3)
board[i][j] = 3; // 1 <- 1
else
board[i][j] = 1; // 0 <- 1
} else { // current is dead
if (liveCount == 3)
board[i][j] = 2; // 1 <- 0
else
board[i][j] = 0; // 0 <- 0
}
}
}
for (int i = 0; i < board.size(); ++ i) {
for (int j = 0; j < board[0].size(); ++ j) {
board[i][j] >>= 1; // right shift 1 to get only new status
}
}
}
// - Reverse Polish Notation // use XML/JSON serialization to allow client side configuration, e.g. add a new operator
include
include "print.h"
include
include
include
include
include
using namespace std;
class Token {
public:
Token() {};
virtual ~Token() = 0;
virtual bool isOperator() const = 0;
virtual void process(stack
class Operand : public Token {
public:
Operand(const string &str) {
m_val = stod(str);
}
double val() const { return m_val; }
virtual bool isOperator() const { return false; }
virtual void process(stack
class Operator : public Token { public: Operator(const string &type) { m_type = type; } const string & type() const { return m_type; } virtual ~Operator() = 0; virtual bool isOperator() const { return true; } private: string m_type; }; Operator::~Operator() {}
class Add : public Operator {
public:
Add() : Operator("+") {}
virtual void process(stack
class Sub : public Operator {
public:
Sub() : Operator("+") {}
virtual void process(stack
class Mul : public Operator {
public:
Mul() : Operator("+") {}
virtual void process(stack
class Div : public Operator {
public:
Div() : Operator("+") {}
virtual void process(stack
class TokenFactory { public: TokenFactory() { operators["+"] = new Add(); operators["-"] = new Sub(); operators["*"] = new Mul(); operators["/"] = new Div(); }
Token * create(const string ¶m) {
if (operators.find(param) != operators.end())
return operators[param];
else
return new Operand(param);
}
private:
unordered_map
class RPNCalculator {
public:
double calculate(const vector
double result = m_stk.top();
m_stk.pop();
if (!m_stk.empty()) {
cout << "invalid formula. " << endl;
return -1;
}
return result;
}
private:
TokenFactory m_factory;
stack
int main() {
vector
return 0;
}
// - mode 5 iterator
class Iterator {
public:
Iterator(const vector
class ModFiveIterator : public Iterator {
public:
ModFiveIterator(const vector
// Returns the next element in the iteration.
int next() {
if (!hasNext())
throw runtime_error("no more elements!");
return nextEle;
}
// Returns true if the iteration has more elements.
bool hasNext() {
while (Iterator::hasNext()) {
int n = Iterator::next();
if (n % 5 == 0) {
nextEle = n;
return true;
}
}
return false;
}
void remove(){
Iterator::remove();
}
private: int nextEle; };
// - given two sets, mathmatical sets not binding to any programing languages, brain storm how all ways find the intersetion of the two sets //http://segmentfault.com/a/1190000003836386
- HashSet time: O(m + n) space: O(n) or O(m), so should put the smaller set into hashset
- Sort everything, then check consecutive duplicates time: O((m+n)log(m+n)) space: O(m+n)
- Sort one set, do binary search of each element in the other set time: sort n, O(nlogn) + O(mlogn) = O((m+n)logn); or sort m, O(mlogm) + O(nlogm) = O((m+n)logm), so should sort the smaller set space: O(1)
- Sort both, but separately, then use two pointers, if two elements equal, put element into result and advance both pointer; else, advance the smaller's pointer time: O(nlogn) + O(mlogm) space: O(1), if can be sort in place
// --------------------Junit Test---------------------------------------
// https://code.google.com/p/guava-libraries/source/browse/guava-tests/test/com/google/common/collect/?r=9113082b77c6f80e0d4d80b24eb16bcee1a23a08
// https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/AbstractMapBasedMultimap.java
abstract class AbstractMapBasedMultimap
/**
Removes all values for the provided key. */ private void removeValuesForKey(Object key) { Collection
collection = Maps.safeRemove(map, key); if (collection != null) { int count = collection.size(); collection.clear(); totalSize -= count; } }
// get public Collection
get(@Nullable K key) { Collection collection = map.get(key); if (collection == null) { collection = createCollection(key); } return wrapCollection(key, collection); }